# Knowledge Distillation

Openr1 Distill 7B · open-r1 · Apache-2.0
OpenR1-Distill-7B is a post-trained version of Qwen2.5-Math-7B on the Mixture-of-Thoughts dataset, designed to teach language models step-by-step reasoning.
Large Language Model · Transformers · English · 134 downloads · 6 likes

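Checkpoints like this one are normally driven through the standard transformers causal-LM API. A minimal generation sketch, assuming the repo id open-r1/OpenR1-Distill-7B and that the checkpoint ships a chat template:

```python
# Minimal generation sketch (repo id assumed: open-r1/OpenR1-Distill-7B).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-r1/OpenR1-Distill-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Solve step by step: what is 12 * 17?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
# Print only the newly generated tokens (the model's step-by-step answer).
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
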
Unime LLaVA 1.6 7B · DeepGlint-AI · MIT
UniME is a universal multimodal embedding model built on a multimodal large language model (LLaVA-1.6-7B), trained at 336×336 image resolution and ranked first on the MMEB leaderboard.
Image-to-Text · Transformers · English · 188 downloads · 3 likes

Unime Phi3.5 V 4.2B · DeepGlint-AI · MIT
UniME is a universal multimodal embedding model built on a multimodal large language model (Phi-3.5-V), designed to break down modality barriers and enable cross-modal retrieval and embedding learning.
Multimodal Alignment · Transformers · English · 54 downloads · 4 likes

Splade Disco Human Mistral · slupart
A conversational search model built on SPLADE++, using a multi-teacher distillation strategy to improve semantic understanding of multi-turn dialogue queries.
Text Embedding · English · 27 downloads · 3 likes

Splade Disco Human · slupart
A conversational search adaptation of SPLADE++ whose query encoder is fine-tuned on the QReCC dataset to improve multi-turn conversational search.
Text Embedding · English · 22 downloads · 2 likes

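Both SPLADE entries above score queries by sparse lexical expansion: a masked-LM head produces a vocabulary-sized weight vector via log(1 + ReLU(logits)) with max pooling over token positions. A simplified sketch of that encoding step, assuming the query encoder loads as a standard masked-LM checkpoint (the repo id slupart/splade-disco-human and this exact preprocessing are assumptions; the authors' own tooling may differ):

```python
# Simplified SPLADE-style sparse query encoding (not the authors' exact pipeline).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "slupart/splade-disco-human"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

query = "and how long does its battery last?"
batch = tokenizer(query, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits                      # (1, seq_len, vocab_size)

# SPLADE activation: log(1 + ReLU(logits)), max-pooled over non-padding tokens.
weights = torch.log1p(torch.relu(logits)) * batch["attention_mask"].unsqueeze(-1)
sparse_vec = weights.max(dim=1).values.squeeze(0)       # (vocab_size,)

# Show the highest-weighted expansion terms.
top = torch.topk(sparse_vec, k=10)
for score, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(tokenizer.convert_ids_to_tokens(idx), round(score, 2))
```
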
Minimaid L2 · N-Bot-Int · Apache-2.0
MiniMaid-L2 is a role-play specialized model further optimized from MiniMaid-L1, achieving outstanding performance among 3B-scale models through knowledge distillation and training on a larger dataset.
Large Language Model · Transformers · English · 63 downloads · 2 likes

Distill Any Depth Small Hf · xingyang1 · MIT
Distill-Any-Depth is a state-of-the-art monocular depth estimation model trained with a knowledge distillation algorithm, delivering efficient and accurate depth estimation.
3D Vision · Transformers · 1,214 downloads · 3 likes

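The "-hf" suffix suggests a transformers-compatible export, so the depth-estimation pipeline should apply; a sketch under that assumption (repo id taken from the listing as xingyang1/Distill-Any-Depth-Small-hf, input image hypothetical):

```python
# Monocular depth estimation sketch; assumes a transformers-compatible export.
from PIL import Image
from transformers import pipeline

depth = pipeline("depth-estimation", model="xingyang1/Distill-Any-Depth-Small-hf")
image = Image.open("room.jpg")              # any RGB image
result = depth(image)
result["depth"].save("room_depth.png")      # depth map rendered as an image
print(result["predicted_depth"].shape)      # raw depth tensor
```
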
ARWKV R1 1B5 · RWKV-Red-Team · Apache-2.0
ARWKV-R1-1B5 is an early preview of an RNN-based 7B model line, trained through three-stage knowledge distillation from DeepSeek-R1-Distill-Qwen-1.5B, with a 2k context length.
Large Language Model · Transformers · Multilingual · 164 downloads · 4 likes

Deepseer R1 Vision Distill Qwen 1.5B Google Vit Base Patch16 224 · mehmetkeremturkcan · Apache-2.0
DeepSeer is a vision-language model built on DeepSeek-R1-Distill-Qwen-1.5B with a ViT image encoder, supporting chain-of-thought reasoning and trained with dialogue templates for vision models.
Image-to-Text · Transformers · 25 downloads · 2 likes

Qwen2.5 14B DeepSeek R1 1M Uncensored · FiditeNemini
A 14B-parameter large language model based on Qwen2.5-14B-DeepSeek-R1-1M, merged with DeepSeek-R1-Distill-Qwen-14B-abliterated-v2 using the TIES merge method.
Large Language Model · Transformers · 154 downloads · 6 likes

Deepseek R1 Distill Qwen 32B Japanese · cyberagent · MIT
A Japanese large language model released by CyberAgent, fine-tuned for Japanese from DeepSeek-R1-Distill-Qwen-32B.
Large Language Model · Japanese · 1,190 downloads · 250 likes

Gguf Jina Reranker V1 Tiny En · Felladrin · Apache-2.0
A GGUF conversion of jina-reranker-v1-tiny-en, a model designed for ultra-fast reranking, based on the JinaBERT architecture and supporting long sequences of up to 8,192 tokens.
Text Embedding · English · 3,831 downloads · 1 like

Deepseek R1 BF16 · unsloth · MIT
An 8B-parameter model from the DeepSeek-R1 family based on the Llama architecture, provided in BF16 and geared toward efficient inference and fine-tuning.
Large Language Model · Transformers · English · 944 downloads · 22 likes

Koala Lightning 700m · etri-vilab
KOALA-Lightning-700M is an efficient text-to-image model distilled from SDXL-Lightning, significantly improving inference speed while maintaining generation quality.
Image Generation · 170 downloads · 6 likes

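KOALA checkpoints reuse the SDXL pipeline interface with a compressed U-Net, so they are typically loaded through diffusers. A sketch, assuming the repo id etri-vilab/koala-lightning-700m; the sampler settings are illustrative rather than the model card's exact recommendation:

```python
# Text-to-image sketch with diffusers (repo id assumed: etri-vilab/koala-lightning-700m).
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "etri-vilab/koala-lightning-700m", torch_dtype=torch.float16
).to("cuda")

prompt = "a watercolor painting of a lighthouse at dawn"
# Lightning-distilled models need only a few denoising steps and low guidance.
image = pipe(prompt, num_inference_steps=10, guidance_scale=3.5).images[0]
image.save("lighthouse.png")
```
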
Phi 2 Sft Ultrachat Full · lole25 · MIT
A large language model based on microsoft/phi-2 and fine-tuned on the ultrachat_200k dataset, suitable for dialogue generation tasks.
Large Language Model · Transformers · Other · 68 downloads · 2 likes

Distil Medium.en · distil-whisper · MIT
Distil-Whisper is a distilled version of the Whisper model that is 6 times faster and 49% smaller, while maintaining performance close to the original on English speech recognition tasks.
Speech Recognition · English · 186.85k downloads · 120 likes

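Distil-Whisper checkpoints plug into the standard automatic-speech-recognition pipeline; a minimal sketch with distil-whisper/distil-medium.en (the audio file name and chunk length are illustrative):

```python
# English speech recognition with Distil-Whisper via the ASR pipeline.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-medium.en",
    chunk_length_s=15,  # chunked long-form transcription
)
print(asr("meeting_recording.wav")["text"])
```
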
Distil Large V2 · distil-whisper · MIT
Distil-Whisper is a distilled version of the Whisper model, achieving 6x speedup and 49% size reduction with only a 1% WER difference on out-of-distribution evaluation sets.
Speech Recognition · English · 42.65k downloads · 508 likes

Rbt4 H312 · hfl · Apache-2.0
MiniRBT is a small Chinese pre-trained model built with knowledge distillation, using Whole Word Masking to improve training efficiency.
Large Language Model · Transformers · Chinese · 34 downloads · 5 likes

Minirbt H288 · hfl · Apache-2.0
MiniRBT is a small Chinese pre-trained model built with knowledge distillation, using Whole Word Masking to improve training efficiency.
Large Language Model · Transformers · Chinese · 405 downloads · 8 likes

Minirbt H256 · hfl · Apache-2.0
MiniRBT is a small Chinese pre-trained model built with knowledge distillation and whole word masking, suitable for a range of Chinese natural language processing tasks.
Large Language Model · Transformers · Chinese · 225 downloads · 7 likes

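The MiniRBT checkpoints are masked language models, so the fill-mask pipeline is a quick way to exercise them; a sketch assuming the repo id hfl/minirbt-h256:

```python
# Chinese fill-mask sketch (repo id assumed: hfl/minirbt-h256).
from transformers import pipeline

fill = pipeline("fill-mask", model="hfl/minirbt-h256")
# "Harbin is the capital of [MASK]longjiang province."
for candidate in fill("哈尔滨是[MASK]龙江的省会。"):
    print(candidate["token_str"], round(candidate["score"], 3))
```
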
Clip Vit Large Patch14 Ko · Bingsu · MIT
A Korean CLIP model trained via knowledge distillation, supporting Korean and English multimodal understanding.
Text-to-Image · Transformers · Korean · 4,537 downloads · 15 likes

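CLIP-style checkpoints like this are easiest to try through the zero-shot image classification pipeline; a sketch assuming the repo id Bingsu/clip-vit-large-patch14-ko and a hypothetical local image:

```python
# Zero-shot image classification with Korean labels (repo id assumed: Bingsu/clip-vit-large-patch14-ko).
from transformers import pipeline

classify = pipeline("zero-shot-image-classification", model="Bingsu/clip-vit-large-patch14-ko")
labels = ["고양이 사진", "강아지 사진", "a photo of a car"]  # mixed Korean/English labels
for pred in classify("cat.jpg", candidate_labels=labels):
    print(pred["label"], round(pred["score"], 3))
```
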
Re2g Qry Encoder Fever · ibm-research · Apache-2.0
Re2G combines neural initial retrieval, reranking, and generation for knowledge-intensive tasks. This model is the Re2G query encoder for FEVER, encoding questions into dense vectors for retrieval.
Text Embedding · Transformers · 17 downloads · 0 likes

Re2g Qry Encoder Nq · ibm-research · Apache-2.0
Re2G is an end-to-end system combining neural retrieval, reranking, and generation for knowledge-intensive tasks. This model serves as its Natural Questions (NQ) question encoder component.
Question Answering System · Transformers · 14 downloads · 0 likes

Distilbert Base Uncased Finetuned Squad · jhoonk · Apache-2.0
A DistilBERT-base-uncased model fine-tuned on the SQuAD question answering dataset, suitable for Q&A tasks.
Question Answering System · Transformers · 15 downloads · 0 likes

Bert Large Uncased Squadv1.1 Sparse 80 1x4 Block Pruneofa · Intel · Apache-2.0
Obtained by fine-tuning a pre-trained 80% sparse (1x4 block) Prune OFA BERT-Large model with knowledge distillation, this model performs strongly on the SQuAD v1.1 Q&A task.
Question Answering System · Transformers · English · 15 downloads · 1 like

Minilm L6 H384 Uncased · nreimers · MIT
A 6-layer lightweight version of microsoft/MiniLM-L12-H384-uncased, obtained by keeping every second layer to reduce model size.
Large Language Model · 9,300 downloads · 36 likes

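The raw MiniLM checkpoint is a plain encoder with no task head, so it is normally fine-tuned or used for mean-pooled sentence embeddings; a feature-extraction sketch assuming the repo id nreimers/MiniLM-L6-H384-uncased:

```python
# Mean-pooled sentence embeddings from the raw encoder (repo id assumed: nreimers/MiniLM-L6-H384-uncased).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "nreimers/MiniLM-L6-H384-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = ["Knowledge distillation compresses models.", "Distilled models run faster."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state              # (batch, seq_len, 384)

mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # average over real tokens only
print(embeddings.shape)                                     # torch.Size([2, 384])
```
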
Tinybert L 4 H 312 V2 · nreimers
TinyBERT is a lightweight BERT model developed by Huawei Noah's Ark Lab, which compresses the model size through knowledge distillation while maintaining high performance.
Large Language Model · 5,166 downloads · 1 like

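TinyBERT, MiniLM, and the other distilled encoders in this list share one core recipe: a small student is trained to match a large teacher's soft predictions. A generic sketch of that loss (temperature-scaled KL divergence plus the usual hard-label cross-entropy); TinyBERT itself additionally distills hidden states and attention maps, which this sketch leaves out:

```python
# Generic knowledge-distillation loss: soft-target KL + hard-label cross-entropy.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions.
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    # Hard targets: standard cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits for a 3-class task.
student_logits = torch.randn(4, 3, requires_grad=True)
teacher_logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
print(distillation_loss(student_logits, teacher_logits, labels))
```
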
Minilmv2 L6 H384 Distilled From BERT Base · nreimers
MiniLMv2 is a lightweight pre-trained language model from Microsoft, distilled from BERT-Base for efficient inference.
Large Language Model · Transformers · 179 downloads · 0 likes

Dkrr Dpr Nq Retriever · castorini
A DPR retriever for Natural Questions trained with the "Distilling Knowledge from Reader to Retriever" (DKRR) approach, which improves retrieval by distilling the FiD reader's knowledge into the retriever.
Question Answering System · Transformers · 38 downloads · 0 likes

Tct Colbert V2 Hnp Msmarco · castorini
TCT-ColBERT-V2 is a dense retrieval model trained with a tightly-coupled teacher mechanism and in-batch negative knowledge distillation, designed for efficient text retrieval.
Text Embedding · Transformers · 1,382 downloads · 4 likes

Tct Colbert V2 Msmarco · castorini
TCT-ColBERT-V2 is a dense retrieval model based on knowledge distillation, improving retrieval efficiency and quality through a tightly-coupled teacher mechanism and in-batch negative training.
Text Embedding · Transformers · 2,220 downloads · 0 likes

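In Pyserini these checkpoints are wrapped by dedicated encoder classes; the underlying idea is to pool BERT token embeddings into one dense vector per query or passage and rank by dot product. A simplified sketch under that assumption (repo id from the listing; the official encoders add query/passage markers and specific pooling rules that this sketch omits):

```python
# Simplified dense-retrieval scoring sketch (not the exact Pyserini preprocessing).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "castorini/tct_colbert-v2-hnp-msmarco"  # repo id taken from the listing
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

def encode(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # mean pooling

query_vec = encode(["what causes tides"])
passage_vecs = encode([
    "Tides are caused by the gravitational pull of the moon and the sun.",
    "The stock market closed higher today.",
])
print(query_vec @ passage_vecs.T)  # dot-product relevance scores, higher = more relevant
```
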
Distill Bert Base Spanish Wwm Cased Finetuned Spa Squad2 Es · mrm8488 · Apache-2.0
A Spanish Q&A model distilled from BETO and fine-tuned on SQuAD2-es, lighter and faster than the standard version.
Question Answering System · Spanish · 2,145 downloads · 48 likes

Tinybert Spanish Uncased Finetuned Ner · mrm8488
A named entity recognition model fine-tuned from Spanish TinyBERT, only 55MB in size, suitable for entity recognition in Spanish text.
Sequence Labeling · Spanish · 64 downloads · 3 likes

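Token-classification checkpoints like this plug directly into the transformers NER pipeline; a sketch assuming the repo id mrm8488/TinyBERT-spanish-uncased-finetuned-ner:

```python
# Spanish NER sketch (repo id assumed: mrm8488/TinyBERT-spanish-uncased-finetuned-ner).
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="mrm8488/TinyBERT-spanish-uncased-finetuned-ner",
    aggregation_strategy="simple",  # merge sub-tokens into whole entities
)
for entity in ner("Gabriel García Márquez nació en Aracataca, Colombia."):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```
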
Distilbert Base Cased Distilled Squad · distilbert · Apache-2.0
DistilBERT is a lightweight distilled version of BERT with 40% fewer parameters and 60% faster inference, retaining over 95% of BERT's performance. This checkpoint is fine-tuned on SQuAD v1.1 for extractive question answering.
Question Answering System · English · 220.76k downloads · 244 likes

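Extractive QA checkpoints like this one work out of the box with the question-answering pipeline (repo id distilbert/distilbert-base-cased-distilled-squad, matching the listing):

```python
# Extractive question answering with the distilled SQuAD model.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert/distilbert-base-cased-distilled-squad")
result = qa(
    question="What does knowledge distillation reduce?",
    context="Knowledge distillation transfers a large teacher model's behaviour to a "
            "smaller student, reducing parameter count and inference latency.",
)
print(result["answer"], round(result["score"], 3))
```
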
Distilbert Base Uncased Distilled Squad · distilbert · Apache-2.0
DistilBERT is a lightweight distilled version of BERT with 40% fewer parameters and 60% faster inference, maintaining over 95% of BERT's performance on the GLUE benchmark. This checkpoint is fine-tuned specifically for question answering.
Question Answering System · Transformers · English · 154.39k downloads · 115 likes
